Automatic Speaker and Language Identification
Abstract
This thesis addresses automatic language identification (LID) and automatic speaker identification (SID) from a speech signal. Both research areas have received renewed interest due to heightened homeland security awareness, e.g. the use of a speaker’s voiceprint for biometric identification, and language identification for classifying speech archives and call scanning. In addition, technological progress in speaker and language identification (SLID) research will also enhance the robustness of automatic speech recognition. The technologies used in SLID applications are closely related, as both apply statistical models to identify the speaker and the language. The key to solving the SLID problem is the detection and exploitation of differences between languages and between speakers. Speakers and languages differ from one another along many dimensions, including phoneme inventory, phonotactics, prosody, syllable structure, lexicon, and grammar. These differences must be identified and extracted from raw speech data to successfully attack the SLID problem. While progress in speaker and language identification research over the last few years has been heartening, significant improvement is still required for real-world applications, as the data available from these environments is often noisy and/or of short duration. Although humans can usually perform SLID successfully on such data, current systems are unable to perform as robustly. This indicates that current systems have not fully exploited all the information available in the recording and that improvements can be made. The first year of the research focused on a general literature study and on investigating different approaches to language identification. The research focuses on investigating discriminative features and their fusion in language identification. We
Similar resources
Segmentation of speech for speaker and language recognition
Current Automatic Speech Recognition systems convert the speech signal into a sequence of discrete units, such as phonemes, and then apply statistical methods on the units to produce the linguistic message. Similar methodology has also been applied to recognize speaker and language, except that the output of the system can be the speaker or language information. Therefore, we propose the use of...
Automatic Language Identification with Discriminative Language Characterization Based on SVM
Robust automatic language identification (LID) is the task of identifying the language from a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition language modeling (PPRLM), support vector machines (SVMs) and general Gaussian mixture models (GMMs). These systems map the cepstral features of spoken utterances into high-level scores by class...
Improvements in Non-Verbal Cue Identification Using Multilingual Phone Strings
Today’s state-of-the-art front-ends for multilingual speech-to-speech translation systems apply monolingual speech recognizers trained for a single language and/or accent. The monolingual speech engine is usually adaptable to an unknown speaker over time using unsupervised training methods; however, if the speaker was seen during training, their specialized acoustic model will be applied, since ...
Session variability compensation in speaker and language recognition
This report summarises the research work performed by the author at the start of his Ph.D. thesis, which is based on robust automatic speaker and language recognition. One of the main causes of errors in automatic speaker and language recognition systems is intrinsic variability between sessions of the same speaker. This variability, known as session or channel variability, is caused by seve...